Abstract

We  describe  a  technique  for  finding  pixelwise  correspondences  between  two  images  by  using  models  of  objects 
of  the  same  class  to  guide  the  search.  The  object  models  are  “learned”  from  example  images  (also  called 
prototypes)  of  an  object  class.  The  models  consist  of  a  linear  combination  of  prototypes.  The  flow  fields  giving 
pixelwise  correspondences  between  a  base  prototype  and  each  of  the  other  prototypes  must  be  given.  A  novel 
image  of  an  object  of  the  same  class  is  matched  to  a  model  by  minimizing  an  error  between  the  novel  image  and 
the  current  guess  for  the  closest  model  image.  Currently,  the  algorithm  applies  to  line  drawings  of  objects.  An 
extension  to  real  grey  level  images  is  discussed. 
1 Introduction

The problem of image correspondence is basic to computer vision and arises
in a number of vision applications such as stereo disparity, object
recognition and motion estimation. General solutions such as optical flow
techniques for computing the pixelwise correspondences between two images
only work when the differences between the two images are relatively small.
When the two images have large differences such as large rotations or
changes in shape, general methods for computing correspondences break down.
For many applications, prior knowledge is available about the contents of
the images for which the correspondence is being computed. This knowledge
may be exploited in order to create a more robust correspondence algorithm.
This is the approach discussed in this paper. We describe an algorithm for
model-based matching which uses a simple model of a class of objects to find
the correspondence between a novel view of an object of the same class and a
standard “prototypical” view. Instead of using 3D models for objects, we
build models from 2D example views of the objects. Our technique requires
that the pixelwise correspondences between each example view and the
standard example view be given by the user (presumably by semiautomatic
techniques) in the training stage. Currently we are concerned with matching
line drawings, although straightforward extensions should allow the
algorithm to be used with real images. Hence, this paper focuses on models
of the shape of objects which do not take into account their textures.

2  Related  work 

Other researchers have studied techniques for constraining the search for
correspondences by assuming a model for the form of valid flow fields. For
example, Cootes and Taylor ([5, 6, 7]) proposed Active Shape Models (ASMs),
which are similar to the approach we are taking. An ASM is built by first
manually identifying a number of control points on a real image of an
object. After the same control points are identified on a number of
different images of the same object, a principal components analysis is done
on the matrix consisting of vectors of control points. This yields a set of
eigenvectors which describe the directions (in control point space) of
greatest variation along which the control points change. An ASM is then the
linear combination of eigenvectors plus parameters for translation, rotation
and scaling. An ASM is matched to a novel image of the object by an
algorithm that searches a region in the novel image around the current
position of each control point to find a position of better fit for each
control point and then updates the parameters of the ASM accordingly. Two of
the main differences between their approach and ours are the fitting
algorithm used (ours is a gradient based approach) and our use of a dense
pixelwise flow field as opposed to a sparse vector of control points. Also,
Cootes and Taylor match shape models (which are basically line drawings) to
real images whereas we match line drawings to line drawings and also
describe a method for matching real image models to real images.

Another group of researchers, Bergen, Anandan, Hanna and Hingorani [1], have
described a framework for grey-level motion estimation. Their work is based
on defining an error function which must be minimized to find the optimal
flow field between two images. The error function they use is the sum of
squared differences between one image and a warping of the other image
according to the current estimate of the flow field. Bergen et al. constrain
the flow field to adhere to some preselected form or model. The error is
then minimized with respect to the parameters of the model by the
Gauss-Newton minimization algorithm. The particular model used to constrain
the flow can be selected according to the particular application. The ones
discussed in Bergen et al. are rather general: affine flow, planar surface
flow, rigid body motion and general optical flow. The main difference
between their work and ours is the type of model used. Our models are
learned from examples and are specific to a particular object class.

The main motivation for our work is the linear class concept of Poggio and
Vetter [11, 9] that justifies modeling an object in terms of a linear
combination of prototypes. Poggio and Vetter showed that linear
transformations can be learned exactly from a small set of examples in the
case of linear object classes. Furthermore, many object transformations such
as 3D rotations of a rigid object and changing expression of a face can be
approximated by linear transformations that can be learned from a small
number of examples. The same motivation underlies the work of Beymer [2],
who describes an alternative approach, also based on a linear combination of
prototypes, to vectorize grey-level images.

3 Model-based matching using prototypes

3.1  The  model 

We would like the models used for model-based matching to be learned from
examples as opposed to being hardwired. To learn a model, a number of
examples or prototypes of an object are given which show how the object can
change. For example, to learn a model of a face with varying pose and facial
expression, several examples of the face at different poses and with
different expressions would be given to the system.

In addition to the prototype images, we require that pixelwise
correspondences be given between one of the prototypes (usually the
“average” prototype), which is chosen to be the base image, and each of the
other prototypes. In practice the correspondences are specified by the user
during this “learning” stage in a semiautomatic way using special tools.

Given the correspondences, each prototype can be “vectorized” - written as a
vector of points. In practice each prototype is represented as two matrices,
one with the displacements in the x direction from each point in the base
image to the corresponding point in the prototype and one with the y
displacements. We define a model in this framework to be a linear
combination of vectorized prototypes or equivalently a linear combination of
example flow fields (see also [10, 3]).

To write the models mathematically, we must first introduce some notation.
Let $I_0$ be the base prototype image to which all the correspondences
refer. Let $N$ be the total number of prototypes. Let $Dx_i$ be the matrix
of displacements in the x direction mapping the coordinates of base image
$I_0$ to the corresponding coordinates of prototype $I_i$. Similarly, let
$Dy_i$ be the matrix of y displacements. Together, $Dx_i$ and $Dy_i$ make up
a flow field. The model images consist of all images whose flow field is a
linear combination of the prototype flow fields plus an affine
transformation. In symbols,


$$Dx' = \sum_{i=1}^{N-1} c_i \, Dx_i + p_0 X + p_1 Y + p_2$$

$$Dy' = \sum_{i=1}^{N-1} c_i \, Dy_i + p_3 X + p_4 Y + p_5$$

The $Dx'$ and $Dy'$ matrices are the flow field describing model image $I'$.
Each row of the constant matrix $X$ is
$(-w/2, -w/2+1, \ldots, -1, 0, 1, \ldots, w/2-1, w/2)$ where $w$ is the
width of the prototype images. Similarly, each column of the constant matrix
$Y$ is $(-h/2, -h/2+1, \ldots, -1, 0, 1, \ldots, h/2-1, h/2)^T$ where $h$ is
the height of the images.
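The flow-field model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: flow fields are stored as lists of rows, the function names `coordinate_grids` and `model_flow` are hypothetical, and the centred coordinates use `k - width // 2` so that each grid has exactly `width` columns and `height` rows.

```python
def coordinate_grids(width, height):
    """Return the constant matrices X and Y as lists of rows.

    Coordinates are centred on the image, so column k holds k - width // 2
    and row r holds r - height // 2 (an assumed discretisation).
    """
    xs = [k - width // 2 for k in range(width)]
    X = [list(xs) for _ in range(height)]
    Y = [[r - height // 2] * width for r in range(height)]
    return X, Y


def model_flow(c, p, Dx, Dy, width, height):
    """Flow field (Dx', Dy') for coefficients c and affine parameters p.

    Dx[i] and Dy[i] are the x and y displacement matrices of the i-th
    non-base prototype; p holds p0..p5.
    """
    X, Y = coordinate_grids(width, height)
    # Start from the affine part: p0*X + p1*Y + p2 and p3*X + p4*Y + p5.
    Dx_model = [[p[0] * X[r][k] + p[1] * Y[r][k] + p[2] for k in range(width)]
                for r in range(height)]
    Dy_model = [[p[3] * X[r][k] + p[4] * Y[r][k] + p[5] for k in range(width)]
                for r in range(height)]
    # Add the weighted prototype flow fields.
    for c_i, Dx_i, Dy_i in zip(c, Dx, Dy):
        for r in range(height):
            for k in range(width):
                Dx_model[r][k] += c_i * Dx_i[r][k]
                Dy_model[r][k] += c_i * Dy_i[r][k]
    return Dx_model, Dy_model
```

With all of `c` and `p` set to zero the result is the zero flow field, which corresponds to the base prototype itself.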

These equations describe the flow fields for the model images. To actually
get the grey level representation of $I'$, it is necessary to warp base
image $I_0$ according to $Dx'$ and $Dy'$ and thereby render the matrices
$Dx'$ and $Dy'$ as a black and white image. If the warp function simply
moves pixels in the base image according to the flow field (without doing
any blurring or hole filling) then a model image can be written

$$I'(x + Dx'(x, y),\; y + Dy'(x, y)) = I_0(x, y).$$
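A minimal sketch of this simplest warp, assuming images are lists of rows with white background 1.0 and black ink 0.0. The function name `forward_warp` is hypothetical, and moving only the ink pixels (so that background never overwrites a line) is an assumption suited to line drawings rather than something the text specifies.

```python
import math


def forward_warp(base, Dx, Dy, background=1.0):
    """Move each ink pixel of `base` by its flow vector.

    No blurring or hole filling is done, so gaps can appear in the lines.
    """
    height, width = len(base), len(base[0])
    out = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if base[y][x] == background:
                continue  # assumption: only ink pixels are moved
            # Round the target position to the nearest pixel (halves round up).
            tx = math.floor(x + Dx[y][x] + 0.5)
            ty = math.floor(y + Dy[y][x] + 0.5)
            if 0 <= tx < width and 0 <= ty < height:
                out[ty][tx] = base[y][x]
    return out
```

Pixels whose target falls outside the image are dropped in this sketch.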

To obtain prototype line drawings and the associated correspondences in
practice, a drawing program is used. A model of a new object is made by
first creating a line drawing of the base image. The base image is usually
the approximate average image in terms of the various object transformations
one wants to represent. Next, new examples of the object are drawn by
changing the lines and curves of the base prototype. The pixelwise
correspondences between the base prototype and each additional prototype can
then be computed automatically since the equations describing the lines and
curves in each prototype are known. A typical example base of prototype
images is shown in figure 1.

3.2  Matching  novel  images 

Now that the prototypes have been defined, we want to use them to find the
pixelwise correspondence between the base prototype and a novel image that
is in the same object class as the prototypes. The general strategy for
matching the novel image will be to define an error between the novel image
and the current guess for the closest model image after rendering it and
then try to minimize this error with respect to the linear coefficients
$c_i$ and the affine parameters $p_i$.


Following this strategy, we define the sum of squared differences error

$$E(c, p) = \frac{1}{2} \sum_{x,y} \left[ I^{novel}(\hat{x}, \hat{y}) - I^{model}(\hat{x}, \hat{y}) \right]^2$$

where

$$\hat{x} = x + \sum_{i=1}^{N-1} c_i \, Dx_i(x, y) + p_0 x + p_1 y + p_2,$$

$$\hat{y} = y + \sum_{i=1}^{N-1} c_i \, Dy_i(x, y) + p_3 x + p_4 y + p_5,$$

the sum is over all pixels $(x, y)$ in the images, $I^{novel}$ is the novel
grey level image being matched and $I^{model}$ is the model grey level
image. Assuming the simplest warping function,

$$I^{model}(\hat{x}, \hat{y}) = I_0(x, y).$$

In this case, the error can be written

$$E(c, p) = \frac{1}{2} \sum_{x,y} \left[ I^{novel}(\hat{x}, \hat{y}) - I_0(x, y) \right]^2$$


The  sum  of  squared  differences  error  depends  on  the 
model  parameters  and  gives  a  measure  of  the  distance 
between  the  novel  image  and  the  current  guess  for  the 
model  image.  Minimizing  the  error  yields  the  model 
image  which  best  fits  the  novel  image. 
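The error under the simplest warping function can be sketched as follows. The helper names `bilinear` and `ssd_error` are hypothetical, the warped position is taken to be the pixel index plus an already combined model flow, and bilinear interpolation with border clamping is an assumed way of sampling the novel image at non-integer positions.

```python
import math


def bilinear(image, x, y):
    """Sample `image` at a real-valued (x, y), clamping to the border."""
    height, width = len(image), len(image[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


def ssd_error(novel, base, Dx_model, Dy_model):
    """E = 1/2 * sum over (x, y) of [novel(x_hat, y_hat) - base(x, y)]^2."""
    error = 0.0
    for y, row in enumerate(base):
        for x, base_value in enumerate(row):
            # Position in the novel image that base pixel (x, y) maps to.
            x_hat = x + Dx_model[y][x]
            y_hat = y + Dy_model[y][x]
            diff = bilinear(novel, x_hat, y_hat) - base_value
            error += 0.5 * diff * diff
    return error
```

When the model flow carries every base pixel onto a pixel of the same grey level in the novel image, the error is zero.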

In order to minimize the error function, the Levenberg-Marquardt algorithm
([12]) is used (a similar use of Levenberg-Marquardt is described in [13]).
This algorithm requires the derivative of the error with respect to each
parameter. The necessary derivatives are as follows:
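A sketch of what the chain rule gives, assuming the simplest warping function and an error of the form $E = \frac{1}{2} \sum_{x,y} \Delta(x,y)^2$. The shorthand $\Delta$, $I^{novel}_x$ and $I^{novel}_y$ (the partial derivatives of the novel image with respect to its first and second arguments) is introduced here for brevity, and $(\hat{x}, \hat{y})$ denotes the position that base pixel $(x, y)$ is mapped to by the model flow:

$$\Delta(x, y) = I^{novel}(\hat{x}, \hat{y}) - I_0(x, y)$$

$$\frac{\partial E}{\partial c_i} = \sum_{x,y} \Delta(x, y) \left[ I^{novel}_x(\hat{x}, \hat{y}) \, Dx_i(x, y) + I^{novel}_y(\hat{x}, \hat{y}) \, Dy_i(x, y) \right]$$

$$\frac{\partial E}{\partial p_0} = \sum_{x,y} \Delta \, I^{novel}_x \, x, \qquad \frac{\partial E}{\partial p_1} = \sum_{x,y} \Delta \, I^{novel}_x \, y, \qquad \frac{\partial E}{\partial p_2} = \sum_{x,y} \Delta \, I^{novel}_x$$

$$\frac{\partial E}{\partial p_3} = \sum_{x,y} \Delta \, I^{novel}_y \, x, \qquad \frac{\partial E}{\partial p_4} = \sum_{x,y} \Delta \, I^{novel}_y \, y, \qquad \frac{\partial E}{\partial p_5} = \sum_{x,y} \Delta \, I^{novel}_y$$

Each line follows from $\partial \hat{x} / \partial c_i = Dx_i(x, y)$, $\partial \hat{y} / \partial c_i = Dy_i(x, y)$ and the fact that the affine terms are linear in $p_0, \ldots, p_5$.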

Figure 1: A typical example base of prototype line drawings.

Given these derivatives, the Levenberg-Marquardt algorithm can be used
straightforwardly to find the optimal $c$ and $p$. Notice that the algorithm
is a sequence of vectorization and rendering (through warping) steps.

3.3  Improving  performance 

Implementing the minimization described in the previous section using line
drawings as prototypes does not work well when the initial model parameters
are far from the optimal ones. There are a couple of standard techniques we
can use that improve the performance of the matching significantly.

The first improvement is to simply blur the line drawings. Since only the
black pixels are important in a line drawing, a blurring algorithm is used
which only blurs the black pixels onto the white background. Using blurred
line drawings makes the minimization more robust in the sense that the
initial parameters can be much further away from the optimal ones for the
minimization to succeed.
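One way such a blur could look is sketched below. The text does not give the blurring algorithm, so the box filter, the `radius` parameter, the treatment of out-of-image pixels as white, and the name `blur_ink` are all assumptions made for illustration.

```python
def blur_ink(image, radius=1, white=1.0):
    """Box-blur a line drawing so ink spreads onto the background.

    Pixels outside the image are treated as white. Taking the minimum of
    the original and the blurred value keeps ink pixels at full darkness.
    """
    height, width = len(image), len(image[0])
    window = (2 * radius + 1) ** 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < height and 0 <= xx < width
                    total += image[yy][xx] if inside else white
            row.append(min(image[y][x], total / window))
        out.append(row)
    return out
```

The blurred drawing gives a smoother error surface around each line, which is why a starting point further from the optimum can still converge.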

The second improvement is to use a coarse-to-fine approach. This is a
standard technique in computer vision ([4]). The idea is to create a pyramid
of images with each higher level of the pyramid containing an image that is
one fourth the size of the one below. The flow fields must also be
subsampled, and all x and y displacements must be divided by 2.
Levenberg-Marquardt is used to fit the model parameters starting at the
coarsest level, and then these parameters are used as the starting point at
the next level. The constant affine parameters ($p_2$ and $p_5$) must be
multiplied by 2 as they are passed down the pyramid to account for the
increased size of the images.
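The pyramid bookkeeping can be sketched as follows. The function names are hypothetical, and averaging 2x2 blocks is an assumed way of subsampling (chosen so that thin lines are not dropped); image sides are assumed to be divisible by 2 at every level.

```python
def halve(image):
    """Average 2x2 blocks, giving an image with one fourth as many pixels."""
    height, width = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1]
              + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(width)] for y in range(height)]


def halve_flow(D):
    """Subsample a displacement matrix and divide the displacements by 2."""
    return [[value / 2.0 for value in row] for row in halve(D)]


def build_pyramid(image, levels, reducer=halve):
    """Return [finest, ..., coarsest] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(reducer(pyramid[-1]))
    return pyramid


def to_finer_level(p):
    """Double the constant affine terms p2 and p5 when moving down one level."""
    finer = list(p)
    finer[2] *= 2
    finer[5] *= 2
    return finer
```

Passing `reducer=halve_flow` to `build_pyramid` builds the pyramid for a displacement matrix. The coefficients $c_i$ and the linear affine terms are unchanged between levels because the flow fields themselves are rescaled.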

The coarse-to-fine approach also significantly improves the robustness of
the matching. When combined with blurring, the matching algorithm works well
for a large range of settings of the initial parameters.

A stochastic gradient minimization algorithm (described in [14]) has also
been tried in place of Levenberg-Marquardt. It was found to be much faster
(around 25 times) and more robust in that it got caught in local minima less
frequently. The results reported here are with the Levenberg-Marquardt
algorithm because the stochastic gradient algorithm was implemented after
the first draft of this paper.

3.4  Pseudo  code 

The following pseudo code describes the matching algorithm.

1. Load novel image, $I^{novel}$

2. Load base prototype, $I_0$, and flow fields for the other prototypes,
   $Dx_i$ and $Dy_i$

3. Create image pyramids for $I^{novel}$ and $I_0$ and for each $Dx_i$ and
   $Dy_i$

4. Blur all images in novel image pyramid

5. Initialize parameters $c$ and $p$ (typically set to zero)

For each level in the pyramid, starting at the coarsest

   6. Estimate the parameters $c$ and $p$ using Levenberg-Marquardt

      When computing the error in Levenberg-Marquardt, the model image is
      created by warping $I_0$ according to the current linear combination
      of prototype flow fields plus affine parameters and then the resulting
      model image is blurred.

   7. Multiply the constant affine parameters $p_2$ and $p_5$ by 2

   8. Go to next level

9. Output the parameters
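The control flow of steps 5 to 9 can be sketched as a small driver. Everything here is illustrative: `coarse_to_fine_match` is a hypothetical name, `minimize` stands in for whatever optimizer is used at each level (Levenberg-Marquardt in the text), and `levels` is an assumed container for the per-level images and flow fields. The doubling of $p_2$ and $p_5$ is skipped after the finest level, which is one reading of steps 7 and 8.

```python
def coarse_to_fine_match(levels, c, p, minimize):
    """Fit (c, p) level by level, coarsest first.

    `levels` is a list of per-level data ordered coarsest to finest, and
    `minimize(level, c, p)` returns improved (c, p) for that level. Both
    are placeholders for the optimizer and data layout actually used.
    """
    c, p = list(c), list(p)
    for index, level in enumerate(levels):
        c, p = minimize(level, c, p)
        c, p = list(c), list(p)
        if index < len(levels) - 1:
            # Translations double when the image size doubles.
            p[2] *= 2
            p[5] *= 2
    return c, p
```

A caller would build `levels` from the image and flow-field pyramids and pass a `minimize` function that evaluates the sum of squared differences error on the blurred model image.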

3.5  Results 

Some preliminary tests have been done using our approach to model-based
matching. In one such test, the prototype images in figure 1 were used to
create a model of simple cartoon faces. The pixelwise correspondences
between each prototype and the base prototype were obtained using the output
of the drawing program on which the images were generated. The base
prototype is the face in the upper left corner of figure 1.

Figure 2: Results of matching novel images using the prototypes in figure 1.
The novel images are in the top row and the model images which were
estimated are in the bottom row.

Novel images which were similar to those in the example base were created by
hand. These images were drawn so that they were roughly normalized for
translation, scale and rotation. Figure 2 shows the results of fitting the
model to the novel images. The top row of images are the novel images and
the bottom row are the closest model images as estimated by the matching
algorithm described above. The model parameters were all initialized to
zero, which means the base prototype was used as the starting point for the
matching algorithm. As the figure shows, the algorithm did a good job of
finding a model image which matched well with each novel image. The lines in
the model images are thicker due to a small amount of blurring that is done
after warping in order to fill in “holes” left by warping. All model images
are generated from their respective flow fields by warping the base image.

4  Extensions 

4.1 A general hierarchical componentwise approach

An affine transformation is included in the model because it allows for the
novel image being matched to have moderate changes in scale, rotation and
translation from the model prototypes. In other words, the affine parameters
provide some extra tolerance in the model. Of course, the affine parameters
are global in the sense that they scale the whole image or rotate the whole
image as opposed to affecting only a piece or a single feature of the image.
This fact exposes one of the problems with the approach just described. It
is brittle to translations, rotations or scaling of only a single feature in
the image if this local variation is not accounted for by some of the
prototypes. This is more of a problem for matching novel line drawings that
a user has complete freedom in creating than with real images which are
constrained by the physical world.

One obvious solution to this problem is to use a componentwise approach in
which images are treated as being composed of several different components,
say eyes, mouth and nose. Each component would have its own model using the
same formulation as in the previous section. In other words, each component
is specified by a number of prototypes along with the pixelwise
correspondences for each prototype. These components are then combined to
form a complete image, say of a full face, by specifying where each
component can be located. The location information is again specified using
a number of prototypes for the whole image. These image prototypes would
simply consist of x, y locations for each component. A number of image
prototypes would be needed to show how each component could change location
relative to the other components. The new componentwise model would be a
linear combination of location vectors as well as a linear combination of
individual component prototypes.

We are extending this componentwise idea towards a potentially powerful
hierarchical framework to allow more complicated images (with possibly
multiple objects). The idea is to build components from a linear combination
of component prototypes and then build simple objects from a linear
combination of positions of components and then build more complicated
objects from a linear combination of positions of simple objects and so on.

4.2  Using  real  images 

Another ongoing extension to this work is to apply the matching algorithm to
real grey level and color images as opposed to black and white line
drawings.

In this case, in addition to modeling the shape of objects, we also model
the texture of objects. We model texture analogously to the way we modeled
shape - as a linear combination of the grey level values (texture) of the
prototype images (see also [2] for an alternative approach to the same
problem). A rather general justification of models of shape and texture
consisting of linear combinations of prototypical shapes and textures is the
following. Under weak assumptions, one can prove that if any network can
learn to synthesize shape or texture from examples then the desired shape or
texture must be well approximated by a linear combination of the examples
(see [3, 8]).

Let $\{I_j\}$ be the set of prototype images where $I_0$ is the base image.
Define $DI_j$, the image of intensity differences between $I_j$ and $I_0$,
as

$$DI_j(x, y) = I_j(x + Dx_j(x, y),\; y + Dy_j(x, y)) - I_0(x, y).$$

For any $(x, y)$ in the base image, the corresponding model point is

$$I^{model}(\hat{x}, \hat{y}) = I_0(x, y) + \sum_{j=1}^{N-1} b_j \, DI_j(x, y)$$

where

$$\hat{x} = x + \sum_{i=1}^{N-1} c_i \, Dx_i(x, y) + p_0 x + p_1 y + p_2$$

$$\hat{y} = y + \sum_{i=1}^{N-1} c_i \, Dy_i(x, y) + p_3 x + p_4 y + p_5.$$

In other words, the new position of the pixel at location $(x, y)$ in the
base image is determined by a linear combination of prototype positions
(given by $Dx_i(x, y)$ and $Dy_i(x, y)$), and the new grey level value of
the pixel is determined by a linear combination of prototype grey level
values for that pixel. The two linear combinations, for shape and texture
respectively, use the same set of prototype images but two different sets of
coefficients.
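The texture half of the model can be sketched as follows. The function names are hypothetical, and each prototype is assumed to have already been resampled into base-image coordinates using its flow field, so that corresponding pixels share the same array index.

```python
def intensity_difference(prototype_in_base_coords, base):
    """DI_j: prototype grey levels at corresponding points minus I_0."""
    return [[p_j - p_0 for p_j, p_0 in zip(row_j, row_0)]
            for row_j, row_0 in zip(prototype_in_base_coords, base)]


def texture_model(base, DI, b):
    """Return I_0 + sum_j b[j] * DI[j], evaluated in base-image coordinates."""
    height, width = len(base), len(base[0])
    out = [list(row) for row in base]
    for b_j, DI_j in zip(b, DI):
        for y in range(height):
            for x in range(width):
                out[y][x] += b_j * DI_j[y][x]
    return out
```

Setting every `b[j]` to zero reproduces the base texture, and setting a single `b[j]` to one reproduces the texture of prototype `j`. The resulting grey levels would then be placed at the positions given by the shape coefficients.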

To  match  a  novel  grey  level  image,  we  can  still  use 
Levenberg-Marquardt.  The  minimization  is  now  with 
respect  to  the  vector  of  grey  level  coefficients  b  as  well 
as  to  c  and  p. 

5  Applications 
5.1  Image  analysis 

One problem that model-based matching can be applied to is the problem of
image analysis. By image analysis we mean the problem of determining certain
parameters describing an image, such as the pose or expression parameters of
an image of a face. Our approach to image analysis is to learn a mapping
from images to their corresponding parameters (see [3]). The representation
used for the images is critical in this approach. For example, trying to
find a mapping from the raw grey level matrix of an image to its associated
parameters would not result in a mapping which generalized to new images.
This is because the grey level values of an image do not change smoothly as
the objects in the image change smoothly. Instead of using the grey level
representation, Beymer et al. find the pixelwise correspondences for each
example image and use the vector of labelled points for each image as the
image representation. They call the vector of labelled points the
“vectorized” representation of an image. Thus to analyze a new image, it
must first be converted into the vectorized representation. To do this we
can use the model-based matching approach previously described instead of
other techniques such as optical flow. Thus, our approach to image analysis
is to first define a model as described in section 3.1 from a set of
prototype images and their flow fields. The analysis parameters (such as
pose) are also given for each prototype. A mapping is then learned which
maps the vectorized prototypes to their corresponding analysis parameters. A
novel image is analyzed by first matching the linear combination of
prototypes model to the image as described in section 3.2. After matching,
the resulting correspondences are used to create the vectorized
representation for the novel image. The parameters of the novel image are
then calculated by applying the previously learned mapping to the vectorized
representation of the novel image as described in [3].

As described briefly in section 3.5, we have written a system for analyzing
line drawings such as those in figure 1. The system learns to analyze
sketches from a user who trains the system with prototype examples. The
system is first trained with prototypes of line drawings of an object along
with the pixelwise correspondences. Given a set of prototypes, the system
attempts to match a novel line drawing which is approximately in the space
of images spanned by the prototypes using the algorithm of section 3.4. The
model parameters which are found by the matching can be used as the analysis
parameters for the image. Alternatively, the model parameters can be mapped
by an approximation network to a possibly higher level set of analysis
parameters (see [3]). Examples of the higher level parameters would be given
with each prototype.

5.2  Man-machine  interface 

Image analysis can be used to build a general man-machine interface or a
gesture recognition system ([3]). For example, if a model of a hand were
built from example views of a hand then novel views of a hand could be
analyzed to recover their position and orientation. These parameters could
then be used as input to a computer to control things the same way a 3D
mouse does. Other possibilities for a man-machine interface are analyzing
facial expression and using it as input to the computer.

Other  potential  applications  for  model-based 
matching  are  object  recognition,  very  low  bandwidth 
teleconferencing  and  virtual  reality  simulations. 

6  Discussion  and  conclusions 

We have described a robust algorithm for model-based matching. Using object
models to guide the matching algorithm may be essential in cases where the
differences between two images of an object are too great for a general
correspondence algorithm to work well. The need for prior knowledge in the
form of object models comes from the fact that optical flow is an
underconstrained problem, although other ways of adding constraints have of
course been used (see for example [1, 13]).

The linear combination of prototypes model that we described has several
advantages. It is a simple learning-from-examples model that only requires
2D views as opposed to a 3D model. It has a quite deep motivation since the
linear combination of prototypes model is intimately related to general
properties of a very broad class of synthesis networks of the type described
by [3]. A new model is fairly simple to create since all that is required
are a number of example views of the object class and the pixelwise
correspondences for each. Most importantly, the matching algorithm works
well in practice. One problem with this approach is the need for the
correspondences for each prototype. In general we expect that once a good
vocabulary of models is created, new models will not need to be created very
often.